Multi-objective evolution for Generalizable Policy Gradient Algorithms
Performance, generalizability, and stability are three Reinforcement Learning
(RL) challenges relevant to many practical applications, in which they often
present themselves in combination. Still, state-of-the-art RL algorithms fall
short when addressing multiple RL objectives simultaneously, and current
human-driven design practices might not be well-suited for multi-objective RL.
In this paper we present MetaPG, an evolutionary method that discovers new RL
algorithms represented as graphs, following a multi-objective search criterion
in which different RL objectives are encoded in separate fitness scores. Our
findings show that, when using a graph-based implementation of Soft
Actor-Critic (SAC) to initialize the population, our method is able to find new
algorithms that improve upon SAC's performance and generalizability by 3% and
17%, respectively, and reduce instability by up to 65%. In addition, we analyze
the graph structure of the best algorithms in the population and offer an
interpretation of specific elements that help trade performance for
generalizability and vice versa. We validate our findings in three different
continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum.
Comment: 23 pages, 12 figures, 10 tables
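The multi-objective search keeps a separate fitness score per RL objective rather than collapsing them into one number. A minimal sketch of the Pareto-dominance selection such a search could rely on (the `fitness` tuples and individual names here are illustrative, not MetaPG's actual scoring):

```python
def dominates(a, b):
    """True if fitness vector a Pareto-dominates b:
    at least as good on every objective, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population):
    """Individuals not dominated by anyone else. Each entry is a dict with a
    hypothetical 'fitness' tuple, e.g. (performance, generalizability, -instability),
    where higher is better on every axis."""
    return [p for p in population
            if not any(dominates(q["fitness"], p["fitness"])
                       for q in population if q is not p)]
```

With separate scores, an algorithm that trades a little performance for much better generalizability survives on the front instead of being averaged away.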
Evolving Reinforcement Learning Algorithms
We propose a method for meta-learning reinforcement learning algorithms by
searching over the space of computational graphs which compute the loss
function for a value-based model-free RL agent to optimize. The learned
algorithms are domain-agnostic and can generalize to new environments not seen
during training. Our method can both learn from scratch and bootstrap off known
existing algorithms, like DQN, enabling interpretable modifications which
improve performance. Learning from scratch on simple classical control and
gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm.
Bootstrapped from DQN, we highlight two learned algorithms which obtain good
generalization performance on other classical control tasks, gridworld-type
tasks, and Atari games. The analysis of the learned algorithms' behavior shows
resemblance to recently proposed RL algorithms that address overestimation in
value-based methods.
Comment: ICLR 2021 Oral. See project website at
https://sites.google.com/view/evolvingr
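Since the search rediscovers the temporal-difference algorithm when learning from scratch, a minimal sketch of the squared TD(0) loss that such a computational graph would compute for a value-based agent (NumPy, batch form; argument names are illustrative):

```python
import numpy as np

def td_loss(q_values, actions, rewards, next_q_values, dones, gamma=0.99):
    """Mean squared TD(0) error for a batch of transitions.
    q_values: (B, A) Q(s, .); next_q_values: (B, A) Q(s', .);
    dones: 1.0 where the episode ended (no bootstrapping)."""
    q_values, next_q_values = np.asarray(q_values), np.asarray(next_q_values)
    rewards, dones = np.asarray(rewards), np.asarray(dones)
    # Q-value of the action actually taken in each transition
    q_sa = q_values[np.arange(len(actions)), actions]
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), cut off at terminals
    target = rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
    return float(np.mean((q_sa - target) ** 2))
```

In the paper's setting, this whole expression would be one candidate graph in the search space, with nodes for the max, the discount, and the squared error.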
Discovering Representations for Black-box Optimization
The encoding of solutions in black-box optimization is a delicate,
handcrafted balance between expressiveness and domain knowledge -- between
exploring a wide variety of solutions, and ensuring that those solutions are
useful. Our main insight is that this process can be automated by generating a
dataset of high-performing solutions with a quality diversity algorithm (here,
MAP-Elites), then learning a representation with a generative model (here, a
Variational Autoencoder) from that dataset. Our second insight is that this
representation can be used to scale quality diversity optimization to higher
dimensions -- but only if we carefully mix solutions generated with the learned
representation and those generated with traditional variation operators. We
demonstrate these capabilities by learning a low-dimensional encoding for the
inverse kinematics of a thousand-joint planar arm. The results show that
learned representations make it possible to solve high-dimensional problems
with orders of magnitude fewer evaluations than the standard MAP-Elites, and
that, once solved, the produced encoding can be used for rapid optimization of
novel, but similar, tasks. The presented techniques not only scale up quality
diversity algorithms to high dimensions, but show that black-box optimization
encodings can be automatically learned, rather than hand-designed.
Comment: Presented at GECCO 2020 -- v2 (Previous title 'Automating
Representation Discovery with MAP-Elites')
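The quality diversity algorithm used to generate the dataset, MAP-Elites, keeps the best solution found so far in each cell of a behavior grid. A minimal sketch of that loop (the `evaluate`/`sample`/`mutate` callbacks and the 10% resampling rate are illustrative choices, not the paper's configuration):

```python
import random

def map_elites(evaluate, sample, mutate, n_iters=1000, seed=0):
    """Minimal MAP-Elites: maintain an archive mapping behavior cell ->
    (fitness, solution), replacing a cell's elite only on improvement.
    evaluate(x) -> (fitness, cell); sample/mutate produce candidate solutions."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for _ in range(n_iters):
        if not archive or rng.random() < 0.1:
            x = sample(rng)                      # explore from scratch
        else:
            _, parent = rng.choice(list(archive.values()))
            x = mutate(parent, rng)              # vary an existing elite
        f, cell = evaluate(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive
```

The archive of elites is exactly the kind of "dataset of high-performing solutions" the abstract describes feeding into a generative model such as a VAE.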
Small-scale proxies for large-scale Transformer training instabilities
Teams that have trained large Transformer-based models have reported training
instabilities at large scale that did not appear when training with the same
hyperparameters at smaller scales. Although the causes of such instabilities
are of scientific interest, the amount of resources required to reproduce them
has made investigation difficult. In this work, we seek ways to reproduce and
study training stability and instability at smaller scales. First, we focus on
two sources of training instability described in previous work: the growth of
logits in attention layers (Dehghani et al., 2023) and divergence of the output
logits from the log probabilities (Chowdhery et al., 2022). By measuring the
relationship between learning rate and loss across scales, we show that these
instabilities also appear in small models when training at high learning rates,
and that mitigations previously employed at large scales are equally effective
in this regime. This prompts us to investigate the extent to which other known
optimizer and model interventions influence the sensitivity of the final loss
to changes in the learning rate. To this end, we study methods such as warm-up,
weight decay, and µParam (Yang et al., 2022), and combine techniques to
train small models that achieve similar losses across orders of magnitude of
learning rate variation. Finally, we study two cases where instabilities can be
predicted before they emerge by examining the scaling behavior of model
activation and gradient norms.
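The core measurement, how the final loss responds to the learning rate, can be illustrated on a toy problem: sweep learning rates, train to completion at each, and summarize the spread of final losses. Here a 1-D quadratic stands in for training; everything in this sketch is illustrative, not the paper's setup:

```python
import numpy as np

def train(lr, steps=100):
    """Toy stand-in for training: gradient descent on loss = x^2.
    The update x <- (1 - 2*lr) * x converges for lr < 1 and diverges beyond."""
    x = 1.0
    for _ in range(steps):
        x -= lr * 2.0 * x
    return x * x

def lr_sensitivity(lrs, steps=100):
    """Spread of final losses across a learning-rate sweep; a small spread
    means the final loss is insensitive to the learning rate choice."""
    losses = np.array([train(lr, steps) for lr in lrs])
    return float(losses.max() - losses.min())
```

In this toy, sweeps confined to stable learning rates give a tiny spread, while including one unstable rate blows the spread up, the same qualitative signature the paper tracks across model scales.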
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research
Simulation is an essential tool to develop and benchmark autonomous vehicle
planning software in a safe and cost-effective manner. However, realistic
simulation requires accurate modeling of nuanced and complex multi-agent
interactive behaviors. To address these challenges, we introduce Waymax, a new
data-driven simulator for autonomous driving in multi-agent scenes, designed
for large-scale simulation and testing. Waymax uses publicly-released,
real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or
play back a diverse set of multi-agent simulated scenarios. It runs entirely on
hardware accelerators such as TPUs/GPUs and supports in-graph simulation for
training, making it suitable for modern large-scale, distributed machine
learning workflows. To support online training and evaluation, Waymax includes
several learned and hard-coded behavior models that allow for realistic
interaction within simulation. To supplement Waymax, we benchmark a suite of
popular imitation and reinforcement learning algorithms with ablation studies
on different design decisions, where we highlight the effectiveness of routes
as guidance for planning agents and the ability of RL to overfit against
simulated agents.
Estimation of the national disease burden of influenza-associated severe acute respiratory illness in Kenya and Guatemala: a novel methodology
Background:
Knowing the national disease burden of severe influenza in low-income countries can inform policy decisions around influenza treatment and prevention. We present a novel methodology using locally generated data for estimating this burden.
Methods and Findings:
This method begins with calculating the hospitalized severe acute respiratory illness (SARI) incidence for children <5 years old and persons ≥5 years old from population-based surveillance in one province. This base rate of SARI is then adjusted for each province based on the prevalence of risk factors and healthcare-seeking behavior. The percentage of SARI with influenza virus detected is determined from provincial-level sentinel surveillance and applied to the adjusted provincial rates of hospitalized SARI. Healthcare-seeking data from healthcare utilization surveys are used to estimate non-hospitalized influenza-associated SARI. Rates of hospitalized and non-hospitalized influenza-associated SARI are applied to census data to calculate the national number of cases. The method was field-tested in Kenya, and validated in Guatemala, using data from August 2009–July 2011. In Kenya (2009 population 38.6 million persons), the annual number of hospitalized influenza-associated SARI cases ranged from 17,129–27,659 for children <5 years old (2.9–4.7 per 1,000 persons) and 6,882–7,836 for persons ≥5 years old (0.21–0.24 per 1,000 persons), depending on year and base rate used. In Guatemala (2011 population 14.7 million persons), the annual number of hospitalized cases of influenza-associated pneumonia ranged from 1,065–2,259 (0.5–1.0 per 1,000 persons) among children <5 years old and 779–2,252 cases (0.1–0.2 per 1,000 persons) for persons ≥5 years old, depending on year and base rate used. In both countries, the number of non-hospitalized influenza-associated cases was several-fold higher than the hospitalized cases.
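The estimation pipeline in the Methods reduces to a short chain of multiplications. A sketch with made-up inputs (none of these values are the published Kenya or Guatemala figures, and the final step assumes the care-seeking fraction converts hospitalized cases into a total):

```python
def influenza_sari_burden(base_rate_per_1000, risk_adjustment,
                          pct_flu_positive, population,
                          pct_seeking_hospital_care):
    """Follows the abstract's steps with illustrative numbers:
    adjust the base hospitalized-SARI rate for the province, apply the
    influenza-positive fraction, scale to the census population, then
    inflate for cases that never reach a hospital."""
    adjusted_rate = base_rate_per_1000 * risk_adjustment      # per 1,000 persons
    hosp_flu_rate = adjusted_rate * pct_flu_positive          # per 1,000 persons
    hospitalized = hosp_flu_rate * population / 1000.0
    total = hospitalized / pct_seeking_hospital_care          # hosp + non-hosp
    return hospitalized, total - hospitalized
```

For example, a base rate of 30 per 1,000, a 1.2 risk adjustment, 10% influenza positivity, a population of one million, and 25% hospital care-seeking yield 3,600 hospitalized and 10,800 non-hospitalized cases, reproducing the abstract's pattern of non-hospitalized cases being several-fold higher.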
Conclusions:
Influenza virus was associated with a substantial amount of severe disease in Kenya and Guatemala. This method can be performed in most low- and lower-middle-income countries.